Discovering Cache Partitioning Optimizations for the K Computer
Abstract
The processor architecture of the K computer (SPARC64 VIIIfx) features a hardware cache partitioning mechanism called the sector cache. This facility enables software to split the memory cache into two independent sectors: data loads into one sector cannot trigger the eviction of data in the other. Moreover, software is responsible for data placement in each sector, issuing special instructions that tag the various memory loads performed during execution. The implementation of this cache partitioning mechanism also allows the cache to be redistributed quickly, at no cost, during an application's runtime, so any optimization using the sector cache can be applied multiple times, with different setups, in the event of phase changes. Unfortunately, the compilers currently provided on the K Computer do not implement any automatic optimization using this cache facility. On the contrary, the only high-level interface to this mechanism is a set of directives instructing the compiler to generate tagging instructions over a code region. Thus, only application programmers with intricate knowledge of both the memory access patterns of their code and the K Computer architecture can take advantage of this facility. To address this issue and to study new optimization schemes using cache partitioning, we present in this paper a framework that uses binary instrumentation and reuse distance analysis to discover the locality of important data structures in an application and to suggest appropriate data distribution schemes for the sector cache. These optimizations are then translated into calls to the source-level API provided by the K Computer compilers. We applied our framework to analyze and optimize a set of HPC benchmarking applications and demonstrate significant performance improvements.
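To illustrate the core metric the framework relies on: the reuse distance of a memory access is the number of distinct addresses touched since the previous access to the same address. Data structures whose reuse distances fit within one sector's capacity are good candidates for isolation in that sector. The following is a minimal sketch of this analysis over a recorded address trace; it is an illustrative stack-based implementation, not the paper's actual binary-instrumentation pipeline.

```python
# Reuse distance analysis over a memory-access trace: for each access, count
# the number of DISTINCT addresses touched since the last access to the same
# address (None for a first, compulsory access). Small reuse distances mean
# the data would survive in a correspondingly small cache sector.
from collections import OrderedDict

def reuse_distances(trace):
    """Return the reuse distance of each access in `trace` (None = first use)."""
    lru = OrderedDict()          # addresses ordered from least to most recent
    out = []
    for addr in trace:
        if addr in lru:
            # Distance = number of distinct addresses more recent than `addr`.
            keys = list(lru)
            out.append(len(keys) - keys.index(addr) - 1)
            del lru[addr]
        else:
            out.append(None)     # cold access: no previous use to measure from
        lru[addr] = True         # mark `addr` as most recently used
    return out

# Example: a streaming array (a0, a1, ...) interleaved with a small reused
# datum x. The stream never reuses, while x is reused at constant distance 1,
# so x-like data profits from being protected in its own sector.
trace = ["x", "a0", "x", "a1", "x", "a2", "x"]
print(reuse_distances(trace))
```

Production reuse-distance tools replace the linear stack scan with a balanced tree or interval counting to handle billion-access traces, but the measured quantity is the same.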
Similar resources
Partitioning Inverted Lists for Efficient Evaluation of Set-Containment Joins in Main Memory
We present an algorithm for efficient processing of set-containment joins in main memory. Our algorithm uses an index structure based on inverted files. We focus on improving performance of the algorithm in a main-memory environment by utilizing the L2 CPU cache more efficiently. To achieve this, we employ some optimizations including partitioning the inverted lists and compressing the intermed...
Data Caches in Multitasking Hard Real-Time Systems
Data caches are essential in modern processors, bridging the widening gap between main memory and processor speeds. However, they yield very complex performance models, which makes it hard to bound execution times tightly. This paper contributes a new technique to obtain predictability in preemptive multitasking systems in the presence of data caches. We explore the use of cache partitioning, d...
Array Data Layout for the Reduction of Cache Conflicts
The performance of applications on large-scale shared-memory multiprocessors depends to a large extent on cache behavior. Cache conflicts among array elements in loop nests degrade performance and reduce the effectiveness of locality-enhancing optimizations. In this paper, we describe a new technique for reducing cache conflict misses. The technique, called cache partitioning, logically divides...
Synergy: A Hypervisor Managed Holistic Caching System
Efficient system-wide memory management is an important challenge for over-commitment based hosting in virtualized systems. Due to the limitation of memory domains considered for sharing, current deduplication solutions simply cannot achieve system-wide deduplication. Popular memory management techniques like sharing and ballooning enable important memory usage optimizations individually. Howev...
Toward Automated Cache Partitioning for the K Computer
The processor architecture available on the K computer (SPARC64 VIIIfx) features a hardware cache partitioning mechanism called sector cache. This facility enables software to split the memory cache into two independent sectors and to select which one will receive a line when it is retrieved from memory. Such control over the cache by an application enables significant performance optimization op...
Publication date: 2013